Simultaneous Object Classification and
Segmentation with High-Order
Multiple Shape Models

Federico  Lecumberry,  Student  Member,  IEEE,  Alvaro  Pardo,  Member,  IEEE, 
and  Guillermo  Sapiro,  Senior  Member,  IEEE 


Abstract — Shape  models  (SMs),  capturing  the  common  features 
of  a  set  of  training  shapes,  represent  a  new  incoming  object 
based  on  its  projection  onto  the  corresponding  model.  Given 
a  set  of  learned  SMs  representing  different  objects  classes, 
and  an  image  with  a  new  shape,  this  work  introduces  a  joint 
classification-segmentation  framework  with  a  twofold  goal.  First, 
to  automatically  select  the  SM  that  best  represents  the  object,  and 
second,  to  accurately  segment  the  image  taking  into  account  both 
the  image  information  and  the  features  and  variations  learned 
from  the  on-line  selected  model.  A  new  energy  functional  is 
introduced  that  simultaneously  accomplishes  both  goals.  Model 
selection is performed based on a shape similarity measure, determining
on-line which model to use at each iteration of the
steepest  descent  minimization,  allowing  for  model  switching  and 
adaptation  to  the  data.  High-order  SMs  are  used  in  order  to  deal 
with  very  similar  object  classes  and  natural  variability  within 
them.  Position  and  transformation  invariance  is  included  as  part 
of  the  modeling  as  well.  The  presentation  of  the  framework  is 
complemented with examples for the difficult task of simultaneously
classifying and segmenting closely related shapes, such as
stages  of  human  activities,  in  images  with  severe  occlusions. 

Index Terms — Shape priors, image segmentation, object modeling,
variational formulations.

I. Introduction

Object  segmentation  is  one  of  the  most  fundamental  tasks 
in  image  processing,  still  lacking  a  completely  automatic 
solution.  The  main  idea  is  to  find  a  set  of  features  that 
describes  and  discriminates  the  object  of  interest  from  the  rest 
of the image. Object color is a low level feature that can be
used as such a descriptor, although its discrimination capacity
is  often  insufficient  in  real  images.  Using  shape  as  a  high 
level  feature  is  a  common  approach  to  augment  such  low  level 
features. 

The  shape  of  the  desired  object  is  added  as  a  descriptor, 
constraining  the  set  of  possible  solutions  to  regions  of  the 
image  that  simultaneously  “match”  this  shape  and  the  low 
level  features  (color,  edges,  etc.).  The  most  common  way  to 
add  this  shape  information  is  in  the  form  of  a  weighted  linear 
combination of functionals addressing, on one hand, the low
level features and, on the other hand, the shape priors or models.
This leads to a minimization problem where the solution
is a compromise between the shape of the final contour and
the information given by the image. The minimization techniques
used in the literature include, among others, gradient
descent methods [1]-[5] and graph-cuts [6]. The shape
representations used include signed distance functions (SDF) [1]-[3],
[5], [6], quadratic splines [7], characteristic functions [4], and
landmark points [8].

When  M  different  objects  (or  object  classes)  can  appear 
in  an  image,  a  single  shape  prior  (model)  is  not  sufficient, 
and  multiple  shape  priors  must  be  considered.  A  possible,  but 
not  elegant,  approach  is  to  run  the  process  with  each  one  of 
the  shape  priors  separately,  and  then  choose  the  best  solution. 
Vu  and  Manjunath  [6]  and  Cremers  et  al.  [5]  define  M 
possible  labels  for  each  pixel  on  the  image,  and  then  propose 
a  segmentation  energy  that  includes  the  optimization  of  these 
labels  in  order  to  determine  where  to  apply  each  prior.  In  a 
different  work,  Cremers  et  al.  [7]  perform  density  estimation 
in  a  non-linear  feature  space,  where  different  objects  are 
separable.  The  proposed  energy  is  then  minimized  considering 
both  the  curve’s  control  points  and  the  image. 

Considering  the  natural  deformations  and  the  variability  of 
objects  within  a  class,  high-order  shape  models  (SMs)  should 
be included in the segmentation. Leventon et al. [1] compute
PCA on a set of registered shapes, fitting a Gaussian probability
distribution to the coefficients of the reconstruction.
This allows the probability of a certain shape to be included in
traditional geodesic active contours for low level features,
leading to a MAP estimation of the object in the image. Tsai et
al. [3] also use PCA to model shape variations, defining an
energy for aligning the binary shapes, and formulate
a segmentation functional optimizing the parameters of the
representation with the first deformation modes. Cootes and
Taylor [8] compute, using PCA, a point distribution model of
landmark points defining a shape. More recently, Charpiat
et al. [4] proposed a framework to compute non-linear shape
statistics based on the Hausdorff distance between shapes, and
then model distributions similarly to [1].

In this work, a new framework for image segmentation with
multiple high-order shape models is introduced, addressing
at the same time the selection of the model and its image-driven
positioning and adjustment to the modeled deformations.
Invariance is included as part of the framework as well.
In particular, the high-order SMs are computed using PCA in
a similar way as [1], [3], obtaining a set of eigenmodes of
variation. In the case of dynamic shapes with large, non-linear
deformations, a method to obtain a linear approximation of
the shape space is described using a dimensionality reduction
algorithm. The model is selected with a binary
selection coefficient, learned on-line based on a similarity
measure between shapes. The proposed framework follows
from a functional that combines two terms. The first one is
a region-based segmentation term [9]. The second term is a
combination of the multiple high-order SMs, addressing the
model selection and constraining the solution to the high-order
shape information coming from the on-line selected model.
While the framework is presented for planar curves, it can be
easily extended to data in higher dimensions.

Fig. 1. (a) Four shapes from one of the classes, and (b) the first three modes of variation of the corresponding model in the walking sequence. The thick black line is the mean shape, the red lines are obtained varying the amplitude (see text). (c) Original shape (in black) and its projections $\mathcal{P}_k^d\phi$ (colored as in Figure 2) to M = 5 different models in the walking sequence, one for each cluster. The mean shape of the corresponding model is plotted too (black curve). The projections are ordered based on the measure given by Equation (8). Note how the projection is completely deformed when using the wrong shape model. (d)-(g) First three modes of variation for four different shape models (classes 2, 3, 4 and 5). (This figure is in colors.)

The remainder of the paper is organized as follows.
Section II briefly reviews the definition and properties of
shape models. Section III describes the proposed framework.
Section IV presents experiments testing the ideas and their
discussion. Section V proposes an extension of the framework
that is invariant to translation. Finally, Section VI concludes the
work.

II. High-order multiple shape models

Consider $M$ sets $\Phi_k$, $k = 1, \ldots, M$, each with $N_k$ registered
shapes $\Phi_k = \{\phi_1^k, \ldots, \phi_{N_k}^k\}$, where each $\phi_i^k$ is a
signed distance function (SDF) whose zero level-set curve
represents a shape from the $k$-th class of objects. Let $\mathcal{M}_k^d$
be a $d$-order model that captures the intrinsic deformations
of the training set for the class $k$. In this work, $\mathcal{M}_k^d$ is
derived from a PCA decomposition of the training set (all
the shapes are represented as vectors in $\mathbb{R}^D$, $D$ being the
size of the range of the corresponding SDFs),

$$\mathcal{M}_k^d := \{\mu_k, U_k^d\}, \qquad (1)$$

where $\mu_k \in \mathbb{R}^D$ is the mean shape of $\Phi_k$ and $U_k^d \in \mathbb{R}^{D \times d}$ is a
matrix containing the first $d$ modes of variation (eigenmodes),
$U_k^d = [\{u_i\}_{i=1}^{d}]$, $u_i \in \mathbb{R}^D$.

A model $\mathcal{M}_k^d$ generates a representation of a new incoming
shape $\phi$ by the $d$-projection $\mathcal{P}_k^d\phi$,

$$\mathcal{P}_k^d\phi = \mu_k + U_k^d \alpha_k, \qquad (2)$$

where $\alpha_k \in \mathbb{R}^d$ are the corresponding reconstruction coefficients,
which of course depend on $\phi$ (see for example [1,
Section 2.1] for details).

The accuracy of the representation depends on the similarity
between $\phi$ and the shapes in $\Phi_k$. Constraining the shape
variations in the class $\Phi_k$ to be small (compared with the deformations
across different classes $k$) makes it possible to obtain accurate
representations using a linear approximation like PCA.

Finally, let

$$\mathfrak{M} = \{\mathcal{M}_k^d\}_{k=1}^{M}$$

be a set of SMs for the $M$ different classes of objects. For
simplicity, the order $d$ in the notation is omitted from now on.
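
As a rough illustration of the projection in Equation (2), the sketch below projects a vectorized shape onto a single model. It assumes the eigenmodes are orthonormal, so that the reconstruction coefficients reduce to inner products of the centered shape with each mode. The function name `project_onto_model` and the toy vectors are hypothetical and only serve the example.

```python
from typing import List, Sequence

Vector = List[float]


def project_onto_model(phi: Sequence[float],
                       mean: Sequence[float],
                       modes: Sequence[Sequence[float]]) -> Vector:
    """Return mean + U @ alpha, with alpha = U^T (phi - mean).

    `modes` holds the d eigenmodes as rows, each of length D.
    Assumption: the modes are orthonormal, as PCA eigenvectors are.
    """
    centered = [p - m for p, m in zip(phi, mean)]
    # Reconstruction coefficients: one inner product per eigenmode.
    alpha = [sum(u_i * c_i for u_i, c_i in zip(u, centered)) for u in modes]
    # Add the weighted modes back onto the mean shape.
    projection = list(mean)
    for a, u in zip(alpha, modes):
        for i, u_i in enumerate(u):
            projection[i] += a * u_i
    return projection


# Toy model in D = 3 with a single mode (d = 1) along the first axis.
mean_shape = [1.0, 1.0, 1.0]
modes_d1 = [[1.0, 0.0, 0.0]]
new_shape = [3.0, 2.0, 0.0]
projected = project_onto_model(new_shape, mean_shape, modes_d1)
```

With an empty list of modes (d = 0) the same function returns the mean shape, which is the "only mean shape" prior used in one of the experiments of Section IV.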

Figure 1 shows SMs for a walking person. Figure 1a shows
four different shapes from one of the classes of shapes; note the
similarity between them. Figure 1b shows the first three modes
of variation of the corresponding model in a walking sequence.
The data was obtained by filming a single person walking against
a static background [10]. The thick black line is the zero
level set of the mean shape. The red lines are the zero level
sets of the sum of the mean shape and a constant times
the first, second, or third eigenmode respectively, varying the
amplitude. Figure 1c shows an original shape from the set
and its projections (with d = 21) to M = 5 different models
in a walking sequence. The mean shape of the corresponding
model is plotted too (black curve). The projections are ordered
based on the measure given later by Equation (8). Note how
the projection is completely deformed when using the wrong
model, clearly illustrating the importance of selecting the
correct shape model (prior). Figures 1d, 1e, 1f and 1g show
the first three modes of variation of the other four models
obtained from the same walking sequence. The procedure to
obtain these models is explained in the next section.

Fig. 2. (a) Low dimensional embedding. Each point corresponds to the first three coordinates of the mapping obtained with DM. The colors correspond to the M = 5 obtained clusters. One sample shape from each cluster (walking-cycle position) is shown. (b) Eighteen samples from the walking sequence colored based on the obtained clusters. (This figure is in colors.)

A.  Clustering  a  set  of  shapes 

One of the datasets of shapes used was taken from the
sequence of a walking person [10]. Considered as a unique
deformable object, this shape has large, non-linear deformations,
invalidating the hypothesis of small (and linear) shape
variations for this set. To alleviate this, a set of clusters may be
considered. In this way linear approximations can be used to
approximate shape deformations within each cluster to obtain
$\mathfrak{M}$. In order to obtain the clusters, in this work a non-linear
mapping to a Euclidean space is performed based on
Diffusion Maps (DM) [11]. DM is a general framework for
data analysis based on a diffusion process over an undirected
weighted graph, defining a new metric on the data called
Diffusion Distance. Two properties of this metric are important
in the present work. First, as a consequence of the density
renormalized kernel defined to build the graph, the graph
Laplacian (see von Luxburg’s tutorial [12] for the definition and
properties of the graph Laplacian) is an approximation of
the Laplace-Beltrami operator on the underlying manifold,
allowing the Riemannian geometry of the data set to be
recovered regardless of the distribution of the points on the underlying
manifold. Second, the Diffusion Distance is equivalent to the
Euclidean distance in the space with coordinates given by the
mapping function. This makes it possible to simply compute K-means
in the corresponding Euclidean space in order to group the
shapes into $M$ clusters and then obtain a local model in each
cluster.
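
For reference, the standard diffusion maps construction can be summarized as below. This is a sketch of the general method rather than the exact configuration used here: the Gaussian kernel on vectorized shapes, the kernel scale $\varepsilon$, the diffusion time $t$ and the number $m$ of retained coordinates are all assumptions, and the eigenvalues are written $\nu_l$ to avoid a clash with other symbols in the paper.

```latex
\begin{align*}
k_\varepsilon(\phi_i,\phi_j) &= \exp\!\left(-\|\phi_i-\phi_j\|^2/\varepsilon\right),
& q(\phi_i) &= \sum_j k_\varepsilon(\phi_i,\phi_j), \\
\tilde{k}(\phi_i,\phi_j) &= \frac{k_\varepsilon(\phi_i,\phi_j)}{q(\phi_i)\,q(\phi_j)},
& p(\phi_i,\phi_j) &= \frac{\tilde{k}(\phi_i,\phi_j)}{\sum_l \tilde{k}(\phi_i,\phi_l)}, \\
P\psi_l &= \nu_l \psi_l, \quad 1=\nu_0 \ge \nu_1 \ge \dots,
& \Psi_t(\phi_i) &= \left(\nu_1^t\psi_1(\phi_i),\dots,\nu_m^t\psi_m(\phi_i)\right), \\
D_t^2(\phi_i,\phi_j) &= \sum_{l\ge 1}\nu_l^{2t}\left(\psi_l(\phi_i)-\psi_l(\phi_j)\right)^2
&&\approx \|\Psi_t(\phi_i)-\Psi_t(\phi_j)\|^2 .
\end{align*}
```

The division by $q(\phi_i)\,q(\phi_j)$ is the density renormalization mentioned above, and the last line is the property that justifies running K-means directly on the coordinates $\Psi_t$.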

To recapitulate, the clusters are obtained by mapping into a
new space via DM (a kernel method) and then applying K-means
on this space. Note that the subsequent PCA could
actually be performed in this space as well (using Kernel
PCA [13], [14]), though the clustering makes the intra-class
variations already well approximated by ordinary PCA.

Figure 2 shows the clustering result. Figure 2a shows the
low dimensional embedding manifold. Each point corresponds
to the first three coordinates of the mapping, colored based
on the obtained clusters. One sample shape from each cluster
(walking-cycle position) is shown too. Figure 2b shows eighteen
consecutive samples from the walking sequence colored
based on the obtained clusters.

III.  Proposed  variational  framework 

Given an input image $\mathcal{I} : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ containing one
or more shapes generated by the shape models $\mathfrak{M}$, an
energy $E$ is defined to simultaneously select the best model(s)
and obtain a segmentation of the corresponding objects in $\mathcal{I}$
(a single object in each image is considered from now on for
simplicity),

$$(\mathcal{M}^*, \phi^*) := \arg\min_{\mathcal{M} \in \mathfrak{M},\ \phi,\, c_+,\, c_-} E(\mathcal{I}, \phi, c_+, c_-, \mathfrak{M}). \qquad (3)$$

This energy includes two terms linearly combined with the
constant $\lambda$,

$$E(\mathcal{I}, \phi, c_+, c_-, \mathfrak{M}) = E_{CV}(\mathcal{I}, \phi, c_+, c_-) + \lambda E_{SM}(\phi, \mathfrak{M}). \qquad (4)$$

The $E_{CV}$ term is, for the examples in this paper, the energy
introduced by Chan and Vese [9],

$$E_{CV}(\mathcal{I}, \phi, c_+, c_-) = \int_\Omega |\mathcal{I}(x) - c_+|^2 H(\phi(x))\,dx + \int_\Omega |\mathcal{I}(x) - c_-|^2 \left(1 - H(\phi(x))\right) dx + \mu \int_\Omega \delta(\phi(x))\,|\nabla\phi(x)|\,dx, \qquad (5)$$

where $c_+$ and $c_-$ are the averages of the input data inside
and outside the curve $C$ (the zero level set of $\phi$), respectively,
$H(\cdot)$ is the Heaviside function, and $\delta(\cdot)$ is the Dirac function.
This energy attempts to split the input data into two different
regions of approximately constant color or gray level
values ($c_+$ and $c_-$). Other low level descriptors could be used
for a better discrimination, for example texture [15] or edges
[16].
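
To make the role of $c_+$ and $c_-$ concrete, here is a minimal sketch of the two fitting terms of Equation (5) on a small pixel grid. It assumes a sharp Heaviside (inside means $\phi > 0$), unit pixel area, and it omits the length term; the helper names and the toy image are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def region_means(image: Grid, phi: Grid) -> Tuple[float, float]:
    """Average intensity inside (phi > 0) and outside (phi <= 0) the curve."""
    inside = [v for row_i, row_p in zip(image, phi)
              for v, p in zip(row_i, row_p) if p > 0]
    outside = [v for row_i, row_p in zip(image, phi)
               for v, p in zip(row_i, row_p) if p <= 0]
    c_plus = sum(inside) / len(inside) if inside else 0.0
    c_minus = sum(outside) / len(outside) if outside else 0.0
    return c_plus, c_minus


def chan_vese_data_energy(image: Grid, phi: Grid) -> float:
    """Sum of the two fitting terms of the Chan-Vese energy (no length term)."""
    c_plus, c_minus = region_means(image, phi)
    energy = 0.0
    for row_i, row_p in zip(image, phi):
        for v, p in zip(row_i, row_p):
            c = c_plus if p > 0 else c_minus
            energy += (v - c) ** 2
    return energy


# Toy 4x4 image: bright 2x2 square (1.0) on a darker background (0.25).
image = [[1.0 if 1 <= r <= 2 and 1 <= c <= 2 else 0.25 for c in range(4)]
         for r in range(4)]
# Level set that is positive exactly on the bright square.
phi_good = [[1.0 if 1 <= r <= 2 and 1 <= c <= 2 else -1.0 for c in range(4)]
            for r in range(4)]
# Level set shifted by one column, so it mixes object and background.
phi_shifted = [[1.0 if 1 <= r <= 2 and 2 <= c <= 3 else -1.0 for c in range(4)]
               for r in range(4)]

c_plus, c_minus = region_means(image, phi_good)
energy_good = chan_vese_data_energy(image, phi_good)
energy_shifted = chan_vese_data_energy(image, phi_shifted)
```

The fitting energy vanishes when the curve separates the two constant regions exactly and becomes positive as soon as either region mixes both intensities.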

The term $E_{SM}$ adds an additional force aiming at maximizing
the similarity between the evolving shape $\phi$ and its
projection onto only one of the $d$-order models from $\mathfrak{M}$. Which
one of the $M$ models is used depends on the evolution of the
shape and its projection to each model. The proposed term is

$$E_{SM}(\phi, \mathfrak{M}) = \sum_{k=1}^{M} \beta_k E_k(\phi, \mathcal{M}_k), \qquad (6)$$

defining

$$E_k(\phi, \mathcal{M}_k) = \int_\Omega \left( H(\phi(p)) - H(\mathcal{P}_k\phi(p)) \right)^2 dp, \qquad (7)$$

where again $H(\cdot)$ is the Heaviside function, $\beta_k$ is a binary
coefficient that (on-line) selects which of the $M$ models is
used, and $\mathcal{P}_k\phi$ is the projection of $\phi$ onto the model $\mathcal{M}_k$
given by Equation (2). Only one of the $\beta_k$ must be different
from zero in (6), since it is not fair to penalize for models that
do not correspond to the object in the image. This is detailed
next.
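
A discrete reading of the shape term may help here. In the sketch below, each per-model energy counts the pixels where the evolving shape and its projection disagree (sharp Heaviside, unit pixel area), and the binary selector keeps only one of them. The function names and the flattened toy vectors are hypothetical.

```python
from typing import Sequence


def heaviside(z: float) -> float:
    """Sharp Heaviside: 1 inside the shape (z > 0), 0 elsewhere."""
    return 1.0 if z > 0 else 0.0


def model_energy(phi: Sequence[float], projection: Sequence[float]) -> float:
    """Discrete per-model term: squared difference of the two region indicators."""
    return sum((heaviside(a) - heaviside(b)) ** 2
               for a, b in zip(phi, projection))


def shape_energy(phi: Sequence[float],
                 projections: Sequence[Sequence[float]],
                 beta: Sequence[float]) -> float:
    """Selector-weighted sum of the per-model terms."""
    return sum(b * model_energy(phi, proj)
               for b, proj in zip(beta, projections))


# Flattened toy level sets; only the sign pattern matters for these terms.
phi = [1.0, 1.0, -1.0, -1.0]
proj_matching = [0.5, 2.0, -3.0, -1.0]   # same region as phi
proj_other = [1.0, -1.0, -1.0, 1.0]      # disagrees on two pixels

energy_if_first = shape_energy(phi, [proj_matching, proj_other], [1, 0])
energy_if_second = shape_energy(phi, [proj_matching, proj_other], [0, 1])
```

Because only one selector is non-zero, the model that does not describe the object contributes nothing to the energy, however poor its projection is.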
Fig. 3. (a) Mode of variation for the two ellipse models ($\mathcal{M}_v$ in green and $\mathcal{M}_h$ in red); the mean shape of both models is the same and is plotted as a black dashed line. (b) Results for the experiments with $\mathcal{M}_h$ and $\mathcal{M}_v$ (only mean shape). Some steps in the segmentation (see text). (c) Evolution of the shape dissimilarity measure for the experiments with $\mathcal{M}_h$ and $\mathcal{M}_v$. (d) Results for experiments with $\mathcal{M}_h$ and $\mathcal{M}_v$ (complete model). (e) Evolution of the shape dissimilarity measure for the experiments with $\mathcal{M}_h$ and $\mathcal{M}_v$. (This figure is in colors.)


A.  Shape  dissimilarity  measure  and  model  selection 

The non-zero $\beta_k$ in Equation (6) is determined based
on a shape dissimilarity measure ($\Gamma$) between two shapes $\phi_1$
and $\phi_2$,

$$\Gamma(\phi_1, \phi_2) = \int_\Omega \frac{|\phi_1(p)|\,\delta(\phi_2(p))}{\mathrm{length}(C_2)}\,dp + \int_\Omega \frac{|\phi_2(p)|\,\delta(\phi_1(p))}{\mathrm{length}(C_1)}\,dp. \qquad (8)$$

This is a length-normalized variation of the measure introduced
by Funkhouser et al. [17]. This measure evaluates
the sum of Euclidean distances corresponding to moving the
contour of the first shape to points in the contour of the
second shape, and vice versa, scaled by the curve lengths.
In Figure 1c, the projected shapes are ordered according to
increasing values of $\Gamma(\phi, \mathcal{P}_k\phi)$. These ordered values are 1.35,
2.83, 3.59, 5.87, and 7.83 respectively.
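
The dissimilarity measure of Equation (8) can be approximated on a pixel grid as sketched below. The sketch assumes signed distance functions sampled on a regular grid, a compactly supported smoothed Dirac delta of half-width `eps`, and a curve length estimated with the same smoothed delta; these discretization choices, like the helper names, are assumptions and are not taken from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def circle_sdf(size: int, radius: float) -> Grid:
    """Signed distance to a centered circle (positive inside, negative outside)."""
    c = (size - 1) / 2.0
    return [[radius - math.hypot(x - c, y - c) for x in range(size)]
            for y in range(size)]


def smooth_delta(z: float, eps: float = 1.5) -> float:
    """Compactly supported approximation of the Dirac delta (support 2 * eps)."""
    if abs(z) > eps:
        return 0.0
    return (1.0 + math.cos(math.pi * z / eps)) / (2.0 * eps)


def one_sided(phi_a: Grid, phi_b: Grid) -> float:
    """Average of |phi_a| along the zero level set of phi_b."""
    weighted_distance = 0.0
    length_b = 0.0  # discrete estimate of the length of the contour of phi_b
    for row_a, row_b in zip(phi_a, phi_b):
        for a, b in zip(row_a, row_b):
            w = smooth_delta(b)
            weighted_distance += abs(a) * w
            length_b += w
    return weighted_distance / length_b


def dissimilarity(phi_1: Grid, phi_2: Grid) -> float:
    """Symmetric, length-normalized dissimilarity between two shapes."""
    return one_sided(phi_1, phi_2) + one_sided(phi_2, phi_1)


# Two concentric circles of radii 10 and 14 on a 48 x 48 grid.
small = circle_sdf(48, 10.0)
large = circle_sdf(48, 14.0)
gamma = dissimilarity(small, large)
```

For two concentric circles, each contour lies at a distance equal to the radius difference from the other, so the value should come out close to twice that difference (about 8 here). Because of the smoothed delta, the measure of a shape with itself is small but not exactly zero in this discretization.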

Based on (8), a normalized shape similarity measure $\xi_k(\phi)$
between a shape $\phi$ and its projection $\mathcal{P}_k^d\phi$ to the $d$-order $k$-th
model is computed as

$$\xi_k(\phi) = \frac{\gamma_k}{\sum_{j=1}^{M} \gamma_j}, \quad \text{where } \gamma_k = \exp\left(-\Gamma(\phi, \mathcal{P}_k^d\phi)\right). \qquad (9)$$

This normalized similarity measure $\xi_k(\phi)$ is close to one for
the model that best represents $\phi$. Finally, to force the binary
value in $\beta_k$, soft thresholding, based on a sigmoid function,
is performed. Note that a unique coefficient is used as model
selector, instead of one coefficient in each pixel as in [5], [6].
This encourages shape consistency and significantly simplifies
the optimization.
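
The selection step can be sketched numerically with the dissimilarity values reported for Figure 1c. The sketch assumes that the normalization in Equation (9) is a sum over the $M$ models, and the sigmoid center and steepness are illustrative values only, since the paper does not specify them; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_similarity(gammas: Sequence[float]) -> List[float]:
    """exp(-Gamma_k) divided by the sum over all models (assumed normalization)."""
    weights = [math.exp(-g) for g in gammas]
    total = sum(weights)
    return [w / total for w in weights]


def soft_threshold(xi: Sequence[float],
                   center: float = 0.5,
                   steepness: float = 20.0) -> List[float]:
    """Sigmoid pushing each similarity towards 0 or 1.

    `center` and `steepness` are illustrative assumptions.
    """
    return [1.0 / (1.0 + math.exp(-steepness * (x - center))) for x in xi]


# Dissimilarities between a shape and its projections onto the five models,
# as reported in the text for Figure 1c.
gammas = [1.35, 2.83, 3.59, 5.87, 7.83]
xi = normalized_similarity(gammas)
beta = soft_threshold(xi)
selected = max(range(len(beta)), key=beta.__getitem__)
```

With these values the first model receives a similarity of roughly 0.74 and is the only one whose thresholded coefficient is near one, so a single global selector is enough to pick it.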

With the proposed method, one model is always selected
and a segmentation is obtained, even if the shape in the
image has no appropriate model in $\mathfrak{M}$ that provides a good
representation. The final segmentation cannot
be directly compared to the original non-occluded shape
in all the cases, since there is no way to “create” the particular
features or attributes that are occluded in it. Instead,
the resulting segmentation is evaluated taking into account
the fact that the shape is generated by a model in $\mathfrak{M}$
and the solution should then be a “valid” shape generated
by this model. The following measure makes it possible to discard a
segmentation $\hat\phi$ given the selected model $\mathcal{M}_k$. First, the mean $\bar\Gamma_k$
and variance $\sigma_k^2$ of $\Gamma(\phi_i^k, \mu_k)$ are computed $\forall \phi_i^k \in \Phi_k$. Then,
if

$$\Gamma(\hat\phi, \mu_k) > \bar\Gamma_k + 1.5\,\sigma_k, \qquad (10)$$

the segmentation is discarded, and the shape cannot be
recognized.
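
The rejection rule of Equation (10) amounts to a one-sided threshold on the dissimilarity to the mean shape. A minimal sketch follows; it assumes the population standard deviation is meant, and the training values are made-up numbers used only to exercise the function.

```python
import statistics
from typing import Sequence


def is_valid_segmentation(gamma_to_mean: float,
                          training_gammas: Sequence[float],
                          factor: float = 1.5) -> bool:
    """Accept a segmentation unless its dissimilarity to the mean shape
    exceeds mean + factor * std of the training dissimilarities.

    Assumption: the population standard deviation is used.
    """
    mean = statistics.mean(training_gammas)
    std = statistics.pstdev(training_gammas)
    return gamma_to_mean <= mean + factor * std


# Hypothetical dissimilarities between training shapes and their mean shape.
training = [2.0, 3.0, 4.0]
accepted = is_valid_segmentation(4.0, training)
rejected = not is_valid_segmentation(5.0, training)
```

Here the threshold is about 4.22, so a segmentation at distance 4.0 from the mean shape is kept and one at 5.0 is discarded.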


B.  Energy  minimization 

The proposed energy is minimized using a classical gradient
descent method. For the gradient descent of $E_{CV}$, the
expression is given in [9, Equation (9)].

For $E_{SM}$, the obtained expression is

$$\frac{\partial E_{SM}}{\partial \phi} = -2 \sum_{k=1}^{M} \beta_k \left\| H(\phi) - H(\mathcal{P}_k\phi) \right\| \left( \delta(\phi) - \delta(\mathcal{P}_k\phi)\, W \right),$$

where $W = U_k U_k^T$. Although the model selector $\beta_k$ depends
on $\phi$, it is treated as static, as a first order approximation for the
gradient descent, since it affects the model selection and only
indirectly the evolution of the curve.

Finally, the first variation of Equation (4) becomes

$$\frac{\partial E}{\partial \phi} = \frac{\partial E_{CV}}{\partial \phi} + \lambda \frac{\partial E_{SM}}{\partial \phi}.$$


C.  Prior  activation 

The first steps of the optimization are performed without
SM information ($\lambda = 0$) until stationarity; then the “prior is
activated” by adding $E_{SM}$ with $\lambda \neq 0$ (manually selected) until
a new stationary point is reached, now combining the image
and the shape information. This helps to determine the object
in the image without affecting the initial steps of the evolution
with the projections to the models of the initial curve used in
the minimization, which in general has no similarity with
the shapes in the models. A similar idea of “prior activation”
is considered by Vu and Manjunath [6] using shape prior
templates instead of SMs.
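
The two-phase schedule can be illustrated on a deliberately simple scalar problem. In the sketch below a quadratic "data" energy and a quadratic "prior" energy stand in for the image and shape terms; this toy substitution, the step size and the stopping tolerance are assumptions made only to show the order of operations, not the actual level-set evolution.

```python
from typing import Callable, Tuple


def descend(x: float,
            grad: Callable[[float], float],
            step: float = 0.1,
            tol: float = 1e-8,
            max_iter: int = 10_000) -> float:
    """Plain gradient descent until the gradient is (numerically) zero."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def two_phase_minimization(x0: float,
                           grad_data: Callable[[float], float],
                           grad_prior: Callable[[float], float],
                           lam: float) -> Tuple[float, float]:
    """Phase 1: data term only (lambda = 0). Phase 2: prior activated."""
    x_data = descend(x0, grad_data)
    x_final = descend(x_data, lambda x: grad_data(x) + lam * grad_prior(x))
    return x_data, x_final


# Toy energies: (x - a)^2 for the data term and (x - b)^2 for the prior.
a, b, lam = 2.0, 4.0, 1.1
x_data, x_final = two_phase_minimization(
    0.0,
    lambda x: 2.0 * (x - a),
    lambda x: 2.0 * (x - b),
    lam,
)
```

The first phase settles at the minimizer of the data term alone, and the second phase moves from there to the compromise `(a + lam * b) / (1 + lam)`, which mirrors how the activated prior pulls an already image-driven curve towards the model.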

The  Gestalt  Principles  [18]  can  give  some  intuition  to  this 
initial step. The “Principle of Similarity” states that people try
to organize visual elements into groups based on the similarity
of certain features (shape, intensity, texture, etc.). This gives
an  additional  argument  for  trying  to  start  grouping  regions 
of  similar  intensity  and  use  the  results  as  an  initial  point  or 
“primary  units”  for  helping  the  minimization  process.  After  the 
identification  of  these  “primary  units,”  the  addition  of  priors 
is  used  for  a  better  interpretation  of  the  object  or  scene. 

IV.  Experimental  results 
A.  Models  of  ellipses 

The first example is a “toy example,” though illustrative
and challenging, with two models of ellipses, where the
only difference is that the first eigenmode (the only one beyond
the mean shape) is rotated by $\pi/2$ (this already exemplifies the
importance of high-order models). Let us name $\mathcal{M}_v$ the model
with vertical deformations and $\mathcal{M}_h$ the model with horizontal
deformations. Figure 3a shows the mode of variation for both
models, in green for $\mathcal{M}_v$ and in red for $\mathcal{M}_h$.

The input image contains an occluded vertical ellipse,
not present in the training set. Two different experiments
are presented, varying the order $d$ of the model $\mathcal{M}_v$ while
maintaining the highest dimension for the model that does
not represent the input shape, $\mathcal{M}_h$. With $d = 0$, only the
mean shape is considered in the shape prior (no deformations);
with $d = 1$ the vertical deformations are considered. All the
parameters are the same in both experiments. Figures 3b and
3d show some steps in the minimization, and Figures 3c and 3e
show the evolution of the shape dissimilarity measure, for both
experiments, respectively. Steps © and © show an intermediate
curve in the evolution with $\lambda = 0$, and the projections, $\mathcal{P}_h\phi$
and $\mathcal{P}_v\phi$, to both models (dashed colored lines). The initial
curve (in yellow) is also shown. Note that $\mathcal{P}_v\phi$ has no vertical
deformations. The following steps (©,© and ©,©) show the
evolution after the “prior activation” adding the $E_{SM}$ term
($\lambda = 1.1$), and the obtained segmentation (@ and ®).

In the first experiment (Figure 3b), the projections to both
models end in the same shape, the mean shape. This is
reflected also in the graph of the dissimilarity measure (Figure 3c)
by the overlapping of the green and red curves. In the second
experiment (Figure 3d), $\mathcal{M}_v$ captures the variation of the input
shape, as reflected in the obtained segmentation. In this case
there is also a model switch around iteration 200 (step
®), where $\mathcal{M}_h$ is selected while the occlusions are being
filled. After this point, the vertical deformation determines the
selection of $\mathcal{M}_v$ for the rest of the evolution, ending with
an accurate segmentation. Clearly, the high-order model and
the automatic model selection are critical to obtain the correct
segmentation.


Fig. 4. (a) Input image with an occluded shape $\tilde\phi_1$ in gray levels. (b) Projections of $\phi$ in the “prior activation” iteration (blue curve in step ® in Figure 4c) onto the five models, ordered based on $\Gamma(\phi, \mathcal{P}_k\phi)$. The mean shape of the corresponding model is plotted too (black curve). (c) Steps ® to ® in the evolution of $\phi$ (blue curve) and its projections onto the selected model $\mathcal{M}_1$ (green curve). The obtained segmentation is the red curve. (d) Evolution of the shape dissimilarity measure, $\Gamma(\phi, \mathcal{P}_k\phi)$, with the iterations. The curves at the marked steps are shown in Figure 4c. (This figure is in colors.)


B.  Models  from  the  walking  sequence 

Five high-dimensional models of a walking person cycle,
$\mathcal{M}_k^d$, $k = 1, \ldots, 5$ ($d = 21$), were obtained with the procedure
explained in Section II-A. The first three modes of variation for
each model are shown in Figure 1. These are the models in the
set of models $\mathfrak{M} = \{\mathcal{M}_k\}_{k=1}^{5}$ for the next experiment. This
set of models is particularly challenging for model selection
since they are different deformations of the “same object.”

The input image in this experiment contains a new occluded
shape $\tilde\phi_1$ (Figure 4a) that belongs to the model $\mathcal{M}_1$ and is
not in its training set $\Phi_1$. Figure 4 shows details about the
segmentation of $\tilde\phi_1$. Figure 4c shows four steps after the “prior
activation” (steps © to @) in the evolution of $\phi$ (blue curve)
and their projections onto the automatically on-line selected
model $\mathcal{M}_1$ (green curve). The obtained segmentation
(red curve) and its projection are also shown. Figure 4b shows the
projections of $\phi$ (blue curve) in the “prior activation” iteration
onto the five models, ordered based on $\Gamma(\phi, \mathcal{P}_k\phi)$ for this
iteration. The mean shape of the corresponding model is plotted
too (black curve). Compare the projections of the occluded
shapes (Figure 4b) with those of a similar non-occluded shape
in Figure 1c. Note how the projections onto the incorrect
models are not too different, but the projections onto the
correct model have significant differences. Figure 4d plots
the evolution of the shape dissimilarity measure, $\Gamma(\phi, \mathcal{P}_k\phi)$,
for all the iterations and the five models. Note how the correct
model is the one with the lowest dissimilarity measure.

This experiment is repeated four times, maintaining the
same set of models $\mathfrak{M} = \{\mathcal{M}_k\}_{k=1}^{5}$ and changing the
input image. In each repetition, the input image contains a
new occluded shape $\tilde\phi_k$, $k = 2, 3, 4, 5$, belonging to the
models $\mathcal{M}_k$, $k = 2, 3, 4, 5$, respectively. These four images
are shown in Figure 5a with the corresponding obtained
segmentations $\hat\phi_k$ (red curves) and the projections onto the
corresponding selected model (dashed green curves).

The results in Figure 4 show a number of important characteristics
of the proposed framework that are consistent for
all the presented experiments. First, in all the examples the
selected model is the one to which the input shape belongs and
the obtained segmentation is accurate to the data given by the
image. Also, for all the experiments, during the minimization
iterations the model selection is stable and there is no switch
between the models once the shape prior is activated.

Second, it is relatively easy to follow the variations of
the projection in the shape dissimilarity measure graph as
the curve evolves. When the occlusions are being filled, the
projection gets more similar to the shapes in $\Phi_k$ and the
dissimilarity $\Gamma(\phi, \mathcal{P}_k\phi)$ decreases. This is due to the force
generated by the shape term $E_{SM}$, and as the curve gets closer
to its projection this term attracts the curve strongly. Although
this behavior is due to the $E_{SM}$ term, the competition of both
energy terms in areas of the shape where there is no occlusion
keeps the curve close to the contour of the original shape,
preventing it from locally following the projection, meaning that the
$E_{CV}$ energy term is stronger than the prior in this area of the
image. This can be seen in the final segmentation (red curve
in Figure 4c), where the projection (dashed green curve) in the
hand goes through the original shape but the curve respects the
gray level information. Similar details can be seen in Figure 5a
for a different example. This example shows how the two
energy terms collaborate to obtain a good segmentation of
the occluded shape, with each term attempting to define the curve
in the regions where it better describes the solution. Where
there is an occlusion the shape prior term takes control of
the curve, and where there is information of the actual shape
(given by the intensity of the pixels), the data term controls
the curve. This is done in a collaborative way: there is no
discontinuity in the curve and it remains smooth. In order to
achieve this, the projection onto the proper model is critical.
The selection of the parameter $\lambda$ is also important, determining
this collaboration/competition between both energy terms.
In this work, as often done in the literature, $\lambda$ is manually
selected. As a rule of thumb, $\lambda \in (1.0, 1.2)$ was found to be
a good initial estimation.

TABLE I

Numerical validation results (Equation (10)) and dissimilarity measures (Equation (8)) for the obtained segmentations for the experiment in Figures 4 and 5.

Fig. 5. Segmentations obtained with the proposed framework with the set of models $\mathfrak{M} = \{\mathcal{M}_k\}_{k=1}^{5}$ for different input images. (a) Segmentations for the binary occluded shapes $\tilde\phi_2$, $\tilde\phi_3$, $\tilde\phi_4$ and $\tilde\phi_5$ belonging to different models $\mathcal{M}_k$, $k = 2, 3, 4, 5$, respectively. (b) Segmentations of the gray level images with added occlusions. The shapes in these images also belong to different models $\mathcal{M}_k$, which are all correctly selected by the proposed framework. (This figure is in colors.)

Table I shows numerical results for the validation of the
obtained segmentation (see Equation (10)). The dissimilarity
between the obtained shape and the original non-occluded
shape, $T(\hat{\phi}_k, \phi_k)$ (computable in these experiments
since the original shape is accessible), is shown in the last column.
Note that these last measures are, in general, significantly smaller than the mean
TABLE II

Dissimilarity measures (Equation (8)) between the obtained segmentations $\hat{\phi}_k$ and the mean shapes of the models in $\mathfrak{M}$ for the experiment in Figures 4 and 5.
Fig. 6. Obtained segmentations with different orders of the shape model. (a) Set of shapes used in the experiment; the boxed shape is $\phi_s$. (b) First three modes of variation of the model. (c) Segmentation of $\tilde{\phi}_s$ (an occluded version of $\phi_s$) when $\phi_s$ is not in the training set (left to right and top to bottom, $d = [3, 7, 10, 13]$). (d) Segmentation of $\tilde{\phi}_s$ when $\phi_s$ is in the training set (left to right and top to bottom, $d = [3, 10, 13, 14]$). (This figure is in colors.)


dissimilarity between the shapes in the training set and the
mean shape, $\bar{T}_k$. This indicates the high accuracy of the
proposed framework for this data.

Table II shows the dissimilarity measures between the
obtained segmentations and the mean shapes for all the models
in $\mathfrak{M}$, for the five different images and shapes in Figures 4
and 5. The minimum of each row is in bold and lies on the
diagonal, as expected from a correct model selection. Note that
the difference between the minimum and the next greater value
in each row is considerable. This further indicates that the
automatic model selection is correctly performed. Since the obtained
segmentations correspond to shapes generated by the selected
models, in all the experiments the results were validated with
the proposed measure, Equation (10), and were not validated
by the other four models. This further supports the
validity of the proposed framework in general and the on-line
automatic selection of the correct model in particular.
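
The row-wise reading of Table II can be sketched in a few lines of Python. This is only an illustration, assuming that the dissimilarities of Equation (8) are already available as a matrix with one row per segmentation and one column per model. The function name `select_models` and the numbers in `toy` are hypothetical; they are not the values of Table II.

```python
import numpy as np

def select_models(dissimilarity):
    """Row-wise model selection from a dissimilarity matrix.

    Returns, for each row, the index of the model with the minimum
    dissimilarity and the margin to the next greater value.
    """
    d = np.asarray(dissimilarity, dtype=float)
    order = np.argsort(d, axis=1)          # columns sorted per row
    rows = np.arange(d.shape[0])
    best = order[:, 0]                     # selected model per row
    margin = d[rows, order[:, 1]] - d[rows, best]
    return best, margin

# Illustrative numbers only (NOT taken from Table II).
toy = np.array([[1.0, 6.0, 9.0],
                [7.0, 2.0, 8.0],
                [9.5, 6.5, 1.5]])
best, margin = select_models(toy)
```

A selection on the diagonal corresponds to `best[i] == i` for every row, and a large `margin` corresponds to the considerable gap between the minimum and the next greater value noted above.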

Figure 5b shows the obtained segmentations with the proposed
framework for four different gray level images. The
configuration of the framework is the same as in the previous
examples, using the set of models $\mathfrak{M}$. The
automatically selected models, as well as the obtained
segmentations, are also correct and accurate.


C.  Varying  the  order  of  the  models 

This section further analyzes the segmentations when the
order d of the model varies. The shapes used in this test are
shown in Figure 6a. They are fifteen shapes of sharks taken
from the SQUID database [19]. Two different sets of shapes
are defined, $\Phi_{S_1}$ and $\Phi_{S_2}$. $\Phi_{S_1}$ has fourteen shapes, leaving
out the shape marked with a box in Figure 6a, while $\Phi_{S_2}$ uses
the fifteen shapes. Two different models were created, $\mathcal{M}_{S_1}$ and $\mathcal{M}_{S_2}$,
from $\Phi_{S_1}$ and $\Phi_{S_2}$, respectively. Since $\Phi_{S_2}$ is larger than $\Phi_{S_1}$,
$\mathcal{M}_{S_2}$ might have more modes of variation than $\mathcal{M}_{S_1}$.
This happens in this case: the maximal number of modes of
variation is d = 14 for $\mathcal{M}_{S_2}$ and d = 13 for $\mathcal{M}_{S_1}$. Figure 6b shows the
first three modes of variation of $\mathcal{M}_{S_1}$. The modes of variation
of $\mathcal{M}_{S_2}$ are very similar to those of $\mathcal{M}_{S_1}$.

In spite of the “visual” similarity of the shapes in the set,
their variations are larger than in the previous examples. For
instance, they are not just a sampling of the deformation of
an object, like the walking sequence or the ellipses. This can
be observed from the mean dissimilarity measure between the
shapes in the training set and the mean shape, $\bar{T}$. For the
models $\mathcal{M}_{S_1}$ and $\mathcal{M}_{S_2}$ this value is close to 6 (see Tables IIIa and IIIb),
whereas for the models $\mathcal{M}_k$ it is smaller than 4 (see Table I).
This is a significant difference for this dissimilarity measure.
(Note that the dissimilarities can be compared since they are
normalized by their corresponding curve length.)

In order to analyze the segmentations when varying the order of
the model, at first a single model is used in $\mathfrak{M}$ at a time,
without the influence of the model selection component of the
framework.

The first experiment consists of the segmentation of an
input image with an occluded version of the boxed shape in
Figure 6a, with the model $\mathcal{M}_{S_1}$ with different orders d (number
of modes of variation). Let $\phi_s$ be the original shape, $\tilde{\phi}_s$ its
occluded version, and $\hat{\phi}_s^d$ the obtained segmentation with the
d-order model. Figure 6c shows the obtained segmentations
$\hat{\phi}_s^d$ (red curves) for d = [3, 7, 10, 13]. The projection onto the
model is also plotted (dashed green curves).

This experiment is repeated using the model $\mathcal{M}_{S_2}$. Figure 6d
shows the obtained segmentations for d = [3, 10, 13, 14] and
the corresponding projections onto the model.

Table IIIa shows, for the model $\mathcal{M}_{S_1}$, the dissimilarity
measure between the obtained segmentations $\hat{\phi}_s^d$ for different
d and the original non-occluded shape, $T(\hat{\phi}_s^d, \phi_s)$, and the
dissimilarity measure with respect to the mean shape $\mu_{S_1}$,
$T(\hat{\phi}_s^d, \mu_{S_1})$. Table IIIb shows the same results for the model
$\mathcal{M}_{S_2}$ and the mean shape $\mu_{S_2}$.

As  can  be  observed  in  both  experiments,  the  projection 
better  represents  the  shape  as  the  order  increases. 
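
The effect of the order d on the projection can be illustrated with a minimal PCA sketch. It assumes each shape is stored as a flattened vector (for instance a sampled embedding function), which is an assumption of this example and not necessarily the representation used in the experiments. `build_model` and `project` are hypothetical helper names, and the training data below is random rather than the shark shapes.

```python
import numpy as np

def build_model(shapes):
    """Mean shape and eigenmodes (rows) of a set of flattened shapes."""
    x = np.asarray(shapes, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are the modes of variation, sorted by decreasing variance.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s[0]))   # drop numerically null modes
    return mean, vt[:rank]

def project(shape, mean, modes, d):
    """Projection of a shape onto the order-d model (first d modes)."""
    basis = modes[:d]
    coeffs = basis @ (shape - mean)
    return mean + basis.T @ coeffs

# Random stand-in for fifteen training shapes of 200 samples each.
rng = np.random.default_rng(0)
train = rng.normal(size=(15, 200))
mean, modes = build_model(train)
```

With n training shapes the centered data has at most n - 1 non-trivial modes, which agrees with the maximal orders d = 13 and d = 14 of the two shark models. For any shape `x`, the residual `np.linalg.norm(x - project(x, mean, modes, d))` does not increase with d, and d = 0 returns the mean shape.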

In the first experiment (Figure 6c), the obtained segmentation
adjusts better to the shape as the model has
more details to represent. This can be seen, for example, in
the pectoral and tail fins and under the head. However, since
the projection does not perfectly adjust to the object, there
is a competition between both energy terms, generating an
intermediate curve that does not completely fit the object in
the image. If more weight is given to the shape term, other regions
of the curve, in non-occluded areas of the object, will follow
a less accurate approximation of the projection and lead to
a worse segmentation, for example in the belly of the shark.
Among these four values of d, the curve obtained with d = 10
seems to be slightly more accurate than the curve obtained
with d = 13, for example in the adjustment to the tail and to
the pectoral and pelvic fins. This is also supported by the dissimilarity
between the obtained segmentations $\hat{\phi}_s^d$ and the original
non-occluded shape, last column of Table IIIa. This provides an
example of a kind of over-fitting of the model to the shapes
in the training set, capturing features too specific in the higher
eigenmodes.

On the other hand, with the model $\mathcal{M}_{S_2}$, which includes the
TABLE III

Numerical validation results (Equation (10)) and dissimilarity measures (Equation (8)) for the obtained segmentations shown in Figure 6.
Fig. 7. Evolution of the shape dissimilarity measure, $T(\cdot, \cdot)$, with the minimization iterations. (This figure is in colors.)

non-occluded shape in the training set (Figure 6d), the
shape term has more relevance in the segmentation. When
the order of the model increases, the projection gets more
accurate and the segmentation improves. When the order is
low and the projection is not accurate, the segmentation is again
a compromise between the two energy terms, an
intermediate curve between the projection and the edges of the
gray level information. Finally, the main difference between
the segmentations with d = 13 and d = 14 is in the fine details,
such as high curvature points; see the extreme points in the tail
and the pectoral fin.

The last experiment of this section uses both models
in the set $\mathfrak{M} = \{\mathcal{M}_{S_1}, \mathcal{M}_{S_2}\}$ and the same occluded shape
in the input image. Figure 7 plots the dissimilarity measure
for this example. The selected model is $\mathcal{M}_{S_2}$, which has one
additional eigenmode and obtains a better description. The
obtained segmentation is, of course, the same segmentation
shown in Figure 6d with d = 14. This further supports the
necessity of high-order models in order to obtain accurate
segmentations, in particular when the different object classes
are relatively similar.

These obtained segmentations are also validated by the
proposed validation process (Equation (10)).

V.  Invariance  to  translation 

Invariance to geometric transformations (such as translations,
rotations and scaling) is a desirable property in a
general framework for segmentation. One way to achieve it is
to substitute $H(\phi)$ by

$$H\big(a R_\theta(\phi(x - x_0))\big)$$

in Equation (7), as in the work of Cremers et al. [2, section
5.1]. Here, $a$ is a scale factor, $R_\theta$ a rotation matrix of a given
angle $\theta$, and $x_0$ a translation vector.

This section proposes an extension of the functional in
Equation (7), adding invariance to translation (other invariances
can be added similarly).

Consider all the shapes aligned with respect to their corresponding
center of mass, defined for a certain shape $\phi$
as

$$p_\phi = \frac{\int_\Omega p\, H(\phi(p))\, dp}{\int_\Omega H(\phi(p))\, dp}.$$

The shape models are built in the same way as in the previous
section. Considering that all the shapes in $\Phi_k$ have the same
Fig. 8. Segmentations obtained with the translation invariant energy (see Equation (13)). (a) Four zero-order models (shape prior templates). (b) Four different initial curves (yellow curves) and the obtained segmentations (red curves). (c)-(d) Two different initializations with Gaussian noise added: initial curve (in yellow), last curve previous to the “prior activation” (in blue) and obtained accurate and valid segmentation (red curve).


center of mass $p_k$, this point becomes the center of mass
of the model. To obtain the projection of a new, not aligned,
shape $\phi$ with center of mass $p_\phi$, the shape is first translated
to $p_k$ and then projected. Mathematically, the projection onto
the translation invariant model centered at $p_k$ becomes
$\mathcal{P}_k \mathcal{T}(\phi(p), p_\phi - p_k)$; let $p_{\mathcal{P}}$ denote its center of mass. Finally,
the projection needs to be translated back to the original center
of mass. Defining $\mathcal{T}(\phi(p), p_0) = \phi(p + p_0)$, the final projection
onto the translation invariant model is

$$\mathcal{P}_k^{TI} \phi = \mathcal{T}\big(\mathcal{P}_k \mathcal{T}(\phi(p), p_\phi - p_k),\; p_{\mathcal{P}} - p_\phi\big). \qquad (12)$$

Without loss of generality, $p_k = 0$ is assumed from now on.
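
The translate, project and translate-back sequence of Equation (12) can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the original: shapes are sampled on a pixel grid, translations are rounded to integer shifts with periodic borders (`np.roll`), and the model projection is passed in as a hypothetical callable `project`. The function names are illustrative only.

```python
import numpy as np

def center_of_mass(phi):
    """Center of mass of the region {phi > 0}, i.e. of H(phi)."""
    h = (phi > 0).astype(float)
    idx = np.indices(phi.shape).reshape(phi.ndim, -1)
    return idx @ h.ravel() / h.sum()

def translate(phi, p0):
    """Translation phi(p) -> phi(p + p0), integer shifts, periodic borders."""
    shift = tuple(-int(round(c)) for c in p0)
    return np.roll(phi, shift, axis=tuple(range(phi.ndim)))

def ti_project(phi, project, p_k):
    """Translation invariant projection in the spirit of Equation (12)."""
    p_phi = center_of_mass(phi)
    aligned = translate(phi, p_phi - p_k)      # center of mass moves to p_k
    proj = project(aligned)                    # projection onto the model
    p_proj = center_of_mass(proj)
    return translate(proj, p_proj - p_phi)     # back to the original center

# Toy shape: a disk of radius 8 centered at (20, 30), positive inside.
ii, jj = np.indices((64, 64))
phi = 8.0 - np.sqrt((ii - 20.0) ** 2 + (jj - 30.0) ** 2)
p_k = np.array([32.0, 32.0])
# The identity stands in for the model projection in this toy example.
out = ti_project(phi, lambda s: s, p_k)
```

With the identity as a stand-in projection the result coincides with the input, which only checks that the two translations cancel; an actual model projection would replace the `lambda`.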

In order to incorporate the invariance to translation in the
original energy, the shape model terms become

$$E_k(\phi, \mathcal{M}_k) = \int_\Omega \big( H(\phi(p + p_\phi)) - H(\mathcal{P}_k \phi(p + p_\phi)) \big)^2 \, dp. \qquad (13)$$

If the kth model provides a good representation of $\phi$, the
corresponding centers of mass are close, $p_{\mathcal{P}} \approx p_k = 0$. This
approximation is assumed for the derivation of the corresponding
gradient descent expression below, which simplifies the deduction
presented here. In the implementation, however, the actual
$p_{\mathcal{P}}$ is used. The updated gradient descent expression is given
by


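$$\frac{\partial E}{\partial \phi} = -2 \sum_{k=1}^{M} \beta_k \left[ \Delta H(p)\, \Delta\delta(p) + \delta(\phi(p))\, (p - p_\phi)^{T} \int_\Omega \Delta H(z)\, \Delta\delta(z)\, \nabla\phi(z)\, dz \right], \qquad (14)$$

where $\Delta H(p) = H(\phi(p)) - H\big(\mathcal{T}(\mathcal{P}_k \mathcal{T}(\phi(p), p_\phi), -p_\phi)\big)$ and
$\Delta\delta(p) = \delta(\phi(p)) - \delta\big(\mathcal{T}(\mathcal{P}_k \mathcal{T}(\phi(p), p_\phi), -p_\phi)\big)$.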

Again, $\beta_k$ is treated as static, as a first-order approximation
for the gradient descent.


A.  Model  selection  with  invariance  to  translation 

An example of the model selection capabilities of the translation
invariant framework is shown in Figure 8. In order to
test only the model selection, without being influenced by the
adjustment to the selected model, four zero-order models are
used. A zero-order model always yields the same synthesized
shape for any input shape; in this work this synthesized shape is the
mean shape of the model. The zero-order models
are shown in Figure 8a. These four different shapes from the
SQUID database are arranged in a single image, with occlusions
for each shape, and this becomes the input image for testing
the framework. Figure 8b shows four different initial curves
(in yellow) and the segmentations (red curves) obtained with
the proposed framework. Two examples with Gaussian noise
added to the image are shown in Figures 8c and 8d.

In all the cases the segmentations are accurate, which also
implies that the selected model is the correct one. Note that
these results are valid (following our definition of validity,
Equation (10)) even when the initial curves do not clearly
define one object.
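
Selection among zero-order models (templates) after removing translation can be illustrated with the sketch below. It is a simplified stand-in and not the method of this work: shapes are binary masks, alignment uses integer shifts with periodic borders, and the area of the symmetric difference replaces the dissimilarity measure of Equation (8). The names `centered` and `select_template` and the toy masks are hypothetical.

```python
import numpy as np

def centered(mask):
    """Shift a binary mask so its center of mass sits at the array center."""
    idx = np.indices(mask.shape).reshape(mask.ndim, -1)
    com = idx @ mask.ravel().astype(float) / mask.sum()
    target = (np.array(mask.shape) - 1) / 2.0
    shift = tuple(int(round(t - c)) for t, c in zip(target, com))
    return np.roll(mask, shift, axis=tuple(range(mask.ndim)))

def select_template(mask, templates):
    """Index of the template with the smallest symmetric-difference area."""
    a = centered(mask)
    costs = [int(np.logical_xor(a, centered(t)).sum()) for t in templates]
    return int(np.argmin(costs)), costs

# Toy templates (hypothetical, not the SQUID shapes): a square and a bar.
square = np.zeros((64, 64), dtype=bool)
square[20:30, 20:30] = True
bar = np.zeros((64, 64), dtype=bool)
bar[10:14, 5:25] = True
# Input: the same bar placed elsewhere in the image.
moved_bar = np.zeros((64, 64), dtype=bool)
moved_bar[40:44, 30:50] = True
k, costs = select_template(moved_bar, [square, bar])
```

Because both the input and the templates are centered before comparison, the translated bar matches the bar template regardless of where it appears in the image.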


B.  Segmentations  with  invariance  to  translation 

The  last  tests  show  the  translation  invariant  framework 
working  with  the  high-order  models.  The  first  experiment 
reproduces  the  test  with  the  ellipses  (Section  IV-A),  now 
with  the  addition  of  Gaussian  noise  and  the  translation  of 
the  ellipse.  Figure  9a  shows  the  details  of  the  segmentation. 
The  first  subfigure  shows  the  initial  curve  (in  yellow),  an 
intermediate  step  (blue  curve),  and  its  projections  to  both 
models (green and red dashed curves). Note the projections
translated to the center of mass of $\phi$. The second subfigure
shows  the  projection  to  the  correct  model  and  the  curve 
filling  the  occlusions.  The  last  subfigure  shows  the  obtained 
segmentation  (red  curve). 

Figure 9b shows the segmentation of an image with an
occluded binary shape from the walking sequence, and
Figure 9c shows the segmentation of an image with a gray level
shape from the walking sequence, using the set of translation
invariant models for this dataset. Again the result is an accurate
segmentation with a valid shape from the correctly selected
model.
Fig. 9. Segmentations obtained with the translation invariant energy (see Equation (13)). (a) Three steps in the segmentation of the ellipse from Figure 3. Note in the first image the projections translated to the center of mass of $\phi$. The second image shows the projection to the correct model and the curve filling the occlusions. The last image shows the obtained segmentation (red curve). (b) Obtained segmentation of a binary shape from the walking person cycle. (c) Obtained segmentation of a gray-valued shape from the walking person cycle. (This figure is in colors.)


VI.  Concluding  remarks 

A framework for simultaneous and automatic model selection
and object segmentation was introduced in this work. The
proposed  technique  is  based  on  a  new  energy  that  combines 
region  based  segmentation  with  on-line  selection  of  the  best 
model  for  the  object  present  in  the  image,  and  an  adjustment 
to  the  best  description  of  the  object  given  the  selected  model. 

The  segmentation  is  obtained  via  gradient  descent  energy 
minimization,  and  the  model  selection  is  automatic  in  each 
iteration,  without  the  need  to  run  the  segmentation  with  all  the 
models  and  then  select  the  best  solution.  The  on-line  decision 
of  best  description  is  based  on  a  shape  dissimilarity  measure 
between  the  curves.  The  selection  is  such  that  a  unique  model 
candidate  is  considered  at  each  step  of  the  minimization. 
Invariance to shape transformations is incorporated into the
proposed framework as well.

Possible directions for further improvements include incorporating
high-order modes in the validation step and considering
going beyond PCA, as well as including class-dependent
model orders ($d_k$). Results in these directions will be reported
elsewhere.